An Online Editor


L. Peter Deutsch and Butler W. Lampson

University of California, Berkeley, California


An online, interactive system for text editing is described in detail, with remarks on the theoretical and experimental justification for its form. Emphasis throughout is on providing maximum convenience and power for the user. Notable features are its ability to handle any piece of text, the content-searching facility, and the character-by-character editing operations. The editor can be programmed to a limited extent.


Introduction

[...] with the appearance of online systems [...] is still, however, in [...] has arisen. Since the online user [...] text [...] to allow [...]. The program is called [...] powerful [...] with disk or drum storage [...] greater convenience than even the most [...] and some of the techniques which can [...] are the subject of this paper. [...] known to the author is not [...] at Project MAC [...] System Development Corporation [...] at least two have been written [...]

[...] supported [...] Department of [...]

H. KOLLER, Editor

[...] for the PDP-1 [...] a description of the editor in [...] for the SDS [...], which is called QED. [...] to discuss all the valuable [...]. An attempt [...] built into editors for teletypes [...] have been [...] with the exception of the [...] formatted final documents. [...] CRT displays are not considered, since [...] the design considerations are quite different.

[...] requires a simple and mnemonic [...] method of organization which allows the user to think in terms of the structure of his text rather than in terms of some framework fixed by the system. In view of the [...] characteristics of a teletype, [...] advantages to a line-oriented system. However, the physical mechanism of the teletype makes it difficult [...] other [...] somewhat inconvenient.

[...] Since all forms of text [...] handled by the editor, [...] motivation of this generality [...] criteria, and it will be seen from the description below that very little has to be [...] himself in the [...] has been introduced. [...] user to [...] which he has naturally attached to them [...] the construction. More general searches allow him to find occurrences of specified strings. [...] allows the user a [...] freedom to arrange his text as he sees fit without compromising his ability to address it. Many particular [...] can be accommodated within the general framework.

[...] QED's line organization and content addressing [...] of the system. [...] four or five commands [...] understanding of the addressing scheme will provide ample [...] user. [...] will, however, [...] use of additional features, which include (1) a line-edit mode which permits character-by-character editing of any line; (2) buffers for storing frequently [...] sequences; (3) a substitute command; (4) the [...] of editing commands [...] automatically; (5) automatic adjustable tab stops.
Another important consideration in the design of QED has been [...] of implementation. The original version of the system, admittedly without many of the elaborate features, was [...], written, and debugged by one man in less than a week, and the entire program now occupies less than [...] words of reentrant code.

Basic Editing Operations

QED regards the text on which it is operating as a single long string of characters called the main text buffer. Structure is imposed on this string by the interpretation of carriage returns as line delimiters. Lines can be addressed by absolute line number, and the characters on line n are those between the (n - 1)st and the nth carriage returns, including the latter but excluding the former. The line number of a particular line may, of course, change if carriage returns are added or deleted earlier in the buffer. These absolute line numbers are in principle sufficient, together with three simple commands, for any editing operation. All the other devices for addressing text are syntactically equivalent to line numbers; i.e., any address can be replaced by the line number of the line it addresses. It will be convenient in the remainder of this section to take advantage of this fact and defer discussion of other addressing mechanisms until the basic commands have been presented.
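The addressing rule just stated can be made concrete with a small sketch. It assumes the buffer is a single Python string with "\r" standing for the carriage return; `line_span` and `line` are hypothetical names used only for illustration, not QED's code.

```python
CR = "\r"  # assumption: carriage return is the line delimiter


def line_span(buffer: str, n: int) -> tuple[int, int]:
    """Return (start, end) offsets of line n (1-based).

    Line n runs from just after the (n - 1)st carriage return up to and
    including the nth one.
    """
    if n < 1:
        raise IndexError("line numbers start at 1")
    start = 0
    for _ in range(n - 1):
        cr = buffer.find(CR, start)
        if cr < 0:
            raise IndexError(f"no line {n}")
        start = cr + 1
    end = buffer.find(CR, start)
    if end < 0:
        raise IndexError(f"no line {n}")
    return start, end + 1


def line(buffer: str, n: int) -> str:
    start, end = line_span(buffer, n)
    return buffer[start:end]
```

Adding a carriage return earlier in the buffer shifts every later line number by one, which is why a given line's number is not fixed.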

The normal state of the editor is its command mode. Whenever this mode is entered, it prints a carriage return and a prompt character to indicate its readiness for a command. The other modes are text mode and line-edit mode; they will be explained in turn.

All commands take one of the following forms:

command

address command

address, address command

Some commands also take additional arguments. The command itself is in most cases specified by a single letter. The time-sharing system in which the editor runs allows programs to interact with the teletype on a character-by-character basis, and the QED command recognizer makes use of this capability to supply the remaining letters of the command. This has proved to be a valuable aid to the beginning and to the occasional user of the editor. An expert user can suppress the command completion.
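A minimal sketch of how the three command forms and one-letter completion might be parsed. The command table is an assumption assembled from command names that appear in this paper (their first letters happen to be distinct), and only numeric addresses are accepted; content addresses are described later.

```python
import re

# Hypothetical command table; each first letter identifies one command.
COMMANDS = ["APPEND", "CHANGE", "DELETE", "EDIT", "INSERT",
            "PRINT", "READ", "SUBSTITUTE", "TABS", "WRITE"]
BY_LETTER = {name[0]: name for name in COMMANDS}

# [address[,address]] letter, confirmed by a period
_FORM = re.compile(r"^(?:(\d+)(?:,(\d+))?)?([A-Za-z])\.$")


def complete(letter: str) -> str:
    """Supply the remaining letters of a command from its first letter."""
    try:
        return BY_LETTER[letter.upper()]
    except KeyError:
        raise ValueError(f"unknown command letter {letter!r}") from None


def parse(typed: str):
    """Parse what the user types, e.g. '12,14D.', into (addr1, addr2, command)."""
    m = _FORM.match(typed)
    if m is None:
        raise ValueError(f"malformed command {typed!r}")
    a1, a2, letter = m.groups()
    return (int(a1) if a1 else None,
            int(a2) if a2 else None,
            complete(letter))
```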

After a command has been given, it must be confirmed by a period. The teletypes used in the system are full duplex, so that the period may be typed in while the command is being completed. It is therefore unnecessary for the user to synchronize his typing with the computer's responses.

The three basic editing commands are INSERT, DELETE, and PRINT. An insert command has the form

•12INSERT.

The computer generates a carriage return and goes into text mode, in which it will accept a string of characters to be inserted in the text immediately preceding line 12. The text is terminated by a teletype control character, control D, which is generated by holding down the CONTROL shift key and pushing the "D" key. (Control characters appear in boldface type in this paper.)

The existence of control characters, which do not, with a few exceptions, produce any effect on the teletype, makes it possible for the user to give instructions to the editor while he is in text mode without any escape character convention. In addition to D for terminating text input, three delete characters are available. A deletes the last character typed which has not already been deleted, and Q the last line. The third delete character, W, deletes a word, which is defined as all the immediately preceding blanks and all the characters up to the next preceding blank. These characters permit the immediate correction of minor errors in text input.
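The effect of the three delete characters on the text being collected can be sketched as follows. Two points are assumptions: the control characters are represented by their ASCII codes (control A = 0x01, and so on), and Q and W are taken not to reach back past the last carriage return.

```python
CTRL_A, CTRL_Q, CTRL_W, CTRL_D = "\x01", "\x11", "\x17", "\x04"
CR = "\r"


def collect_text(keys: str) -> str:
    """Apply text-mode deletes to a stream of keystrokes; D ends input."""
    text: list[str] = []
    for ch in keys:
        if ch == CTRL_D:                      # end of text input
            break
        if ch == CTRL_A:                      # last undeleted character
            if text:
                text.pop()
        elif ch == CTRL_Q:                    # the line being typed
            while text and text[-1] != CR:
                text.pop()
        elif ch == CTRL_W:                    # a word
            while text and text[-1] == " ":   # immediately preceding blanks
                text.pop()
            while text and text[-1] not in (" ", CR):  # back to next blank
                text.pop()
        else:
            text.append(ch)
    return "".join(text)
```

For instance, typing "REAR", A, then "D, 100, N" leaves "READ, 100, N", the same slip that is later corrected with DELETE and INSERT in Figure 1.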

Although the delete characters themselves are nonprinting, it is desirable that something should appear on the paper when they are used, since otherwise it becomes very difficult to keep track of the state of the text being entered from the keyboard. It has therefore been arranged that A will cause a "↑" to be printed, Q a "←", and W a "\". This convention has the unfortunate result that text containing these characters becomes confusing when the delete characters are used. No confusion is possible, however, when the text is typed out by the editor.

Another important feature of QED's text mode is the tab stop. The user can set tab stops to any positions he desires, using the TABS command. For example:

•TABS.

5, 10, 15, 20.

After the command has been given, there are tab stops at positions 5, 10, 15, and 20 on the line. Thereafter, the character I (labeled TAB on the keyboard) will generate enough blanks to bring the printing element of the teletype to the next tab stop.
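A minimal sketch of how a tab might be turned into blanks. It assumes columns count the characters already on the line, starting at 0, so that a stop at p puts the next character in column p; what happens past the last stop is not stated in the text, and emitting a single blank there is a hypothetical choice.

```python
def expand_tab(column: int, stops: list[int]) -> str:
    """Blanks needed to move the printing element from `column` to the next stop."""
    for stop in sorted(stops):
        if stop > column:
            return " " * (stop - column)
    return " "  # hypothetical behaviour beyond the last stop


def type_line(keys: str, stops: list[int]) -> str:
    """Expand control-I (the TAB key, '\\t') characters in a typed line."""
    line = ""
    for ch in keys:
        line += expand_tab(len(line), stops) if ch == "\t" else ch
    return line
```

With the stops 5, 10, 15, 20 from the example, typing "A", TAB, "B", TAB, "C" yields "A    B    C".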

A command complementary to INSERT is DELETE, which takes the form

•12DELETE.

and causes line 12 to be deleted from the text. Another form is

•12,14DELETE.

which causes lines 12, 13, and 14 to be deleted.

These two commands are sufficient for any editing operation. The third basic QED command is PRINT, which may also be given with one or two addresses:

•12PRINT.

prints line 12, and

•12,14PRINT.

prints lines 12, 13, and 14. When a line is printed, printing characters (those in the teletype type box) are printed in their proper sequence. Nonprinting characters (control characters) are printed as [...] the letter corresponding to the control character.

An editor must also be able to read in data from some [...] medium and write out data for later use. These functions are provided in QED by the READ and WRITE commands, which take the form

•READ FROM PROG1.


•WRITE ON PROG1.

The READ command appends the contents of the file to the main text buffer. The WRITE command may be preceded by two line numbers, in which case only the specified portion of the text will be written out.

Figure 1 illustrates the creation and correction of a small program using the commands which have just been described. [...] use of the APPEND command, [...] which puts the new text after the line addressed; APPEND with no argument puts the text at the end of the buffer. Perusal of this example will suggest the utility of a number of additional conveniences in the editor. The simplest of these is the CHANGE command, which combines the functions of INSERT and DELETE. In fact,

•12CHANGE.

is exactly equivalent to

•12DELETE.

•12INSERT.

Like DELETE, CHANGE can be used with two line numbers. The number of lines inserted has no relation to the number of lines deleted.
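The equivalence between CHANGE and DELETE followed by INSERT can be checked on a toy model. This sketch keeps the buffer as a Python list of lines with 1-based addresses; the `Buffer` class and its method names are hypothetical, chosen only to mirror the command names.

```python
class Buffer:
    """Hypothetical list-of-lines model of the main text buffer (1-based)."""

    def __init__(self, lines=()):
        self.lines = list(lines)

    def insert(self, n, new_lines):
        """nINSERT: the new text goes immediately before line n."""
        self.lines[n - 1:n - 1] = list(new_lines)

    def append(self, new_lines, n=None):
        """nAPPEND: the new text goes after line n; no address means the end."""
        if n is None:
            n = len(self.lines)
        self.lines[n:n] = list(new_lines)

    def delete(self, first, last=None):
        """nDELETE or m,nDELETE: remove lines first..last inclusive."""
        last = first if last is None else last
        del self.lines[first - 1:last]

    def change(self, new_lines, first, last=None):
        """CHANGE is DELETE followed by INSERT at the same line number."""
        self.delete(first, last)
        self.insert(first, new_lines)
```

Replacing two lines with three, as in `change(["X", "Y", "Z"], 2, 3)`, illustrates that the number of lines inserted is independent of the number deleted.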

Two minor extensions of PRINT are single-character commands to print the next line of text (line feed) and to print the preceding line (↑). There is also a command which [...] the text in pages 11 inches long; it provides [...] and numbers if requested.

In the original implementation of QED, the main text buffer was [...] as a string of consecutive characters in [...]. This simple storage allocation scheme makes it [...] to implement the commands so far discussed. A deletion, for example, is accomplished by moving the characters following the deleted section towards the beginning of the buffer to cover the ones being deleted. See Figure 2. Insertion is slightly more complex. The text to be inserted is collected in a special storage area. When it has been completely typed in, the characters after the point at which the insertion is to be made are moved far enough towards the end of the buffer to make room for the new text, which is copied into the space created for it.

Three pointers to the text are maintained by the editor: one to the beginning of the buffer, one to the end, and [...] of the current line. This means that no readjustment [...] is required by the displacements described above.
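The two displacements described above can be sketched with a fixed-capacity array and explicit character moves. The class and its names are hypothetical; only the idea (close the gap on deletion, open a gap on insertion and copy the collected text in) follows the description.

```python
class ContiguousBuffer:
    """Fixed-capacity character store edited by moving characters."""

    def __init__(self, capacity: int):
        self.store = [""] * capacity
        self.end = 0  # pointer to the end of the text

    def text(self) -> str:
        return "".join(self.store[:self.end])

    def delete(self, start: int, stop: int) -> None:
        """Remove characters [start, stop) by moving later ones down."""
        n = stop - start
        for i in range(stop, self.end):        # move toward the beginning
            self.store[i - n] = self.store[i]
        self.end -= n

    def insert(self, at: int, new: str) -> None:
        """Move later characters up to open a gap, then copy `new` into it."""
        n = len(new)
        if self.end + n > len(self.store):
            raise MemoryError("buffer full")
        for i in range(self.end - 1, at - 1, -1):  # move from the top down
            self.store[i + n] = self.store[i]
        self.store[at:at + n] = list(new)
        self.end += n
```

Moving from the top down during insertion (and from the bottom up during deletion) keeps each character from being overwritten before it has been moved.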

•APPEND.
10 REARA↑D, 100, N
SUM = 0
DO 20 I = 1, 1, N
D
•1DELETE.
•1INSERT.
10 READ 100, N
D
•APPEND.
READ 101, X
20 SUM - SUM Q←
20 SUM = SUM + X
WRITE 201, S, W\101, SUM
100 FORMAT [...]
101 FORMAT (F10.5)
END
D
•7DELETE.
•7INSERT.
100 FORMAT (I6)
D
•3DELETE.
•3INSERT.
DO 20 I = 1, N
D
•1,9PRINT.
10 READ 100, N
SUM = 0
DO 20 I = 1, N
READ 101, X
20 SUM = SUM + X
WRITE 101, SUM
100 FORMAT (I6)
101 FORMAT (F10.5)
END

Fig. 1. Example of basic QED commands. Note that control characters (in boldface here) do not print anything.

(a) Main text buffer before deletion
line 1 ... line 12 ... line 15 ... line 40

(b) After deletion
line 1 ... line 12 (old line 15) ... line 37 (old line 40)

Fig. 2. Action of the command •12,14DELETE.

Number 12 / December, 1967

When the amount of text being edited becomes large, [...] simple algorithms begin to become unattractive in terms of efficiency. This problem can be alleviated, however, by dividing the text into artificial pages and leaving a reasonable amount of free space at the end of every page. The effect of nearly all the displacements discussed above can then be confined to a single page. When large insertions or deletions are made it may be necessary to redo the paging completely, but this is an infrequent occurrence. Such a paging scheme is further recommended by the fact that it permits most of the text to be kept out of main memory most of the time. Only one or two pages need to be available for any single editing operation.

Efficiency can be further increased, in a machine which is basically word-oriented, by starting each line in a new word. Since the line always ends with exactly one carriage return, the last word can be filled out with additional returns without any possibility of confusion being introduced. This arrangement [...] all insertions or deletions. Since the text can now be handled a word at a time, it may also be convenient to keep the number of words in each line at the beginning of the line.
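The word-oriented arrangement can be sketched as follows. The number of characters per word is a hypothetical parameter, and the word count is modelled as a plain integer stored in front of each line; neither detail is taken from QED itself.

```python
CHARS_PER_WORD = 3  # hypothetical: characters packed into one machine word
CR = "\r"


def pack_line(line: str) -> list:
    """Pack a line (ending in exactly one CR) into whole words.

    The last word is filled out with extra carriage returns, and the
    line is prefixed with its word count.
    """
    if not line.endswith(CR) or line.count(CR) != 1:
        raise ValueError("a line ends with exactly one carriage return")
    padded = line + CR * (-len(line) % CHARS_PER_WORD)
    words = [padded[i:i + CHARS_PER_WORD]
             for i in range(0, len(padded), CHARS_PER_WORD)]
    return [len(words)] + words


def unpack_line(packed: list) -> str:
    """The first CR ends the line; any later CRs are only padding."""
    text = "".join(packed[1:1 + packed[0]])
    return text[:text.index(CR) + 1]


def line_starts(store: list):
    """Step through a word store line by line using the stored counts."""
    i = 0
    while i < len(store):
        yield i
        i += 1 + store[i]
```

Because every line occupies whole words and carries its length, skipping over a line takes one addition rather than a scan for the next carriage return.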

All these improvements have been incorporated in the [...] implementation of QED. The result has been that [...] editing operations, even on files of [...] 100 thousand characters, can be done with less than a tenth of a second of computation.

As we have already noted, and as even the trivial example in Figure 1 suggests, absolute line numbers are not a sufficiently powerful addressing mechanism. An attempt to edit a 1000-line program would illustrate this point even more forcibly. It is necessary to be able to address a line by its contents as well as by its location. The simplest way to arrange this is to provide each line with a sequence number, generated either automatically by the editor or manually by the user. The lines are kept ordered by sequence number and can be addressed directly. There are two objections to this scheme.

(1) It requires the user to concern himself with an artificial device which has no relevance to his text but nonetheless [...], wasting space and time on output and [...] its usefulness as a document.

(2) Insertions and deletions will eventually force renumbering of the lines. When this happens, a complete new listing must be generated if the sequence numbers are to be of any use. Furthermore, as a result of this process, numbers do not stay attached to lines.

A more satisfactory scheme is a more general kind of content addressing. In its simplest form this allows the user to refer to the line

XYZ ADD = 14

with the address :XYZ:. The meaning of this construction is that the text is to be searched for a line beginning with the characters inside the colons, with the requirement that they be followed by a character which is not a letter or digit. A line such as

XYZA SUB = 24

will therefore not be found. The search begins with the line after the one last accessed and continues, cycling to the beginning of the buffer if it runs off the end, until a line beginning with the specified string is found, or until the entire buffer has been scanned. In the latter case, QED prints ? and awaits a new command.
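The label search just described can be sketched over a list of lines. The function names are hypothetical; returning None stands in for QED printing ?.

```python
def label_matches(line: str, label: str) -> bool:
    """True if `line` begins with `label` followed by a non-alphanumeric character."""
    rest = line[len(label):]
    return line.startswith(label) and (rest == "" or not rest[0].isalnum())


def find_label(lines: list[str], label: str, current: int):
    """Search for :label: starting with the line after `current` (1-based).

    The search cycles to the top of the buffer and stops after one full
    pass; None means the label was not found.
    """
    n = len(lines)
    for step in range(1, n + 1):
        k = (current - 1 + step) % n + 1
        if label_matches(lines[k - 1], label):
            return k
    return None
```

With the lines "ABC ADD = 1", "XYZA SUB = 24", "XYZ ADD = 14", the address :XYZ: skips the second line because "A" is a letter, and a search started on line 3 wraps around and ends on line 3 again.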

This kind of content addressing, called label addressing, is useful for many kinds of text, including most programs. It is also possible, however, to search for a line containing any string of characters in any position by using the construct [string], where string refers to any string of characters not containing ].

The usefulness of content addresses is enhanced by the fact that they may be followed by integer displacements, positive or negative. Thus in Figure 1, the third line could be addressed in any of the following ways:

3
:10:+2          since :10: refers to line 1
:20:-2          since :20: refers to line 5
[...]           since only line 3 contains the string [...]
[101, SUM]-3    since only line 6 contains the string "101, SUM"

The search can be started at any line, rather than at the current one, by putting the starting line immediately before the search construct. Thus in Figure 1, 4[I] would find line 6, as would :20:[101].

Two minor devices offer additional convenience: the character . refers to the current line and the character $ to the last line in the buffer. The "current line" is defined according to rigid rules which are set forth in Table I. The reason for this careful specification is that an experienced user of the editor makes frequent use of . in performing insert and delete operations. If he cannot be perfectly sure of its value, he is forced to print the lines he intends to work on before doing the edits, which is very time-consuming.
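An address evaluator covering the forms discussed so far (line numbers, :label:, [string], . and $, a starting line before a search, and signed displacements) might look like the sketch below. The tokenizer and function names are hypothetical, and the Figure 1 buffer assumes, as an illustration only, that unlabeled Fortran lines begin with blanks.

```python
import re

# One address token: number, :label:, [string], . or $, or a displacement.
_TOKEN = re.compile(r"\d+|:[^:]*:|\[[^\]]*\]|[.$]|[+-]\d+")


def _search(lines, after, pred):
    """Cyclic search starting with the line after `after` (1-based)."""
    n = len(lines)
    for step in range(1, n + 1):
        k = (after - 1 + step) % n + 1
        if pred(lines[k - 1]):
            return k
    raise LookupError("?")  # QED would print ? here


def _is_label(line, label):
    rest = line[len(label):]
    return line.startswith(label) and (rest == "" or not rest[0].isalnum())


def evaluate(address, lines, dot):
    """Evaluate an address such as ':10:+2', '[101, SUM]-3' or '4[I]'."""
    value, pos = dot, 0  # with no starting line, a search begins after '.'
    while pos < len(address):
        m = _TOKEN.match(address, pos)
        if m is None:
            raise ValueError(f"bad address {address!r}")
        tok, pos = m.group(), m.end()
        if tok.isdigit():
            value = int(tok)
        elif tok == ".":
            value = dot
        elif tok == "$":
            value = len(lines)
        elif tok[0] == ":":
            value = _search(lines, value, lambda s, t=tok[1:-1]: _is_label(s, t))
        elif tok[0] == "[":
            value = _search(lines, value, lambda s, t=tok[1:-1]: t in s)
        else:
            value += int(tok)  # signed displacement
    if not 1 <= value <= len(lines):
        raise IndexError(f"address {address!r} is out of range")
    return value


FIG1 = ["10 READ 100, N", "      SUM = 0", "      DO 20 I = 1, N",
        "      READ 101, X", "20 SUM = SUM + X", "      WRITE 101, SUM",
        "100 FORMAT (I6)", "101 FORMAT (F10.5)", "      END"]
```

On this buffer, `evaluate(":10:+2", FIG1, 9)`, `evaluate(":20:-2", FIG1, 9)` and `evaluate("[101, SUM]-3", FIG1, 9)` all give 3, and `4[I]` and `:20:[101]` both give 6, in line with the examples in the text.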


TABLE I. Rules for Determining the Value of "."

Last operation performed      Value of "."
Successful search             Line found
Unsuccessful search           Unchanged
Any insertion                 Last line inserted
Any deletion                  Line preceding first deleted line
Print or write                Last line printed or written
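The rules of Table I can be expressed as a small model that updates "." after each operation. The `Editor` class is hypothetical, and its initial value of "." (the last line) is an assumption, since the table does not cover it.

```python
class Editor:
    """Hypothetical model that tracks '.' (the current line) per Table I."""

    def __init__(self, lines):
        self.lines = list(lines)
        self.dot = len(self.lines)  # assumed starting value

    def search(self, pred):
        n = len(self.lines)
        for step in range(1, n + 1):
            k = (self.dot - 1 + step) % n + 1
            if pred(self.lines[k - 1]):
                self.dot = k                 # successful search: line found
                return k
        return None                          # unsuccessful search: unchanged

    def insert(self, n, new_lines):
        new = list(new_lines)
        self.lines[n - 1:n - 1] = new
        self.dot = n + len(new) - 1          # last line inserted

    def delete(self, first, last=None):
        last = first if last is None else last
        del self.lines[first - 1:last]
        self.dot = first - 1                 # line preceding first deleted line

    def print_lines(self, first, last=None):
        last = first if last is None else last
        shown = self.lines[first - 1:last]
        self.dot = last                      # last line printed
        return shown
```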


BUSIE OLD FOOLE, UNRULY SUNNE
[...]
BUSY OLD FOOL, UNRULY SUN

WHY DO YOU THUS,
[...]
WHY DOST THOU THUS,

THRU WINDOWS AND THRU CURTAINS CALL ON US?
[...]
THROUGH WINDOWES, AND THROUGH CURTAINES CALL ON US?

Fig. 3. Examples of line edits. A . in the control character lines is used to indicate carriage return.

Another very useful convention is that . is used as the argument of a command which is given without one. Thus PRINT. will print the current line. Exceptions to this rule are READ and APPEND, which refer to the end of the buffer unless told otherwise, and WRITE, which writes the entire buffer.

Two minor commands permit an address to be printed either as an absolute line number (= command) or in symbolic form, as the label of the nearest preceding line which does not begin with a blank or asterisk, followed by an integer displacement (← command). Thus with Figure 1 in the buffer, QED would respond to :10:= with 1, to :100:= with 7, to 6← with :20:+1, and to [SUM+X]-1← with :10:+3.
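The symbolic form produced by the ← command can be sketched as follows. Taking the label to be the leading run of letters and digits, letting a labelled line at the addressed position count as its own nearest label, and falling back to the absolute number when no labelled line precedes are all assumptions. The buffer again assumes leading blanks on unlabeled Fortran lines.

```python
import re


def symbolic(lines: list[str], n: int) -> str:
    """Label of the nearest line at or before n that does not begin with a
    blank or asterisk, followed by an integer displacement."""
    for k in range(n, 0, -1):
        line = lines[k - 1]
        if not line or line[0] in " *":
            continue
        label = re.match(r"[A-Za-z0-9]*", line).group()
        if label:
            d = n - k
            return f":{label}:" + (f"+{d}" if d else "")
    return str(n)  # hypothetical fallback: no labelled line precedes n


FIG1 = ["10 READ 100, N", "      SUM = 0", "      DO 20 I = 1, N",
        "      READ 101, X", "20 SUM = SUM + X", "      WRITE 101, SUM",
        "100 FORMAT (I6)", "101 FORMAT (F10.5)", "      END"]
```

On this buffer, `symbolic(FIG1, 6)` gives ":20:+1" and `symbolic(FIG1, 4)` gives ":10:+3", matching the responses quoted for 6← and [SUM+X]-1←.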


But instances frequently arise in which it is necessary to make small changes to a line already in the text: two or three characters may need to be inserted or deleted. This is an area in which the weaknesses of the teletype make [...], and a truly satisfactory solution can only be obtained with a display device on which the user can point to the characters he wishes to change. There are, unfortunately, no mechanisms for addressing characters within a line on a teletype which are not more trouble than they are worth.

QED does, however, contain a character editing mechanism which provides many of the features a user might [...]. This power has been purchased at the cost of considerable complexity; although the basic idea of the line edit is simple, there is a profusion of commands to speed the handling of special cases which is somewhat [...].

[...] •12EDIT. will cause line 12 to be typed out, followed by a carriage return. The editor is now in its line-edit mode, in which it will recognize a number of teletype control characters in addition to the A, Q, W, and D which are normally recognized when text is being typed in. These characters are interpreted as instructions for the construction of a new line from the old one which was typed out. These instructions cause characters to be copied from the old line into the new one, skipped over without being copied, or inserted. When the new line is complete, it replaces the old one, and QED will return to command mode. The simple examples in Figure 3 will clarify the [...].


The first example shows the use of C to copy characters and S to skip over them. Note that when a character is skipped, a [...] is printed so as to keep the new line aligned with the old one. When an ordinary character is typed in, it replaces the corresponding character in the old line. To save repetition of C, a Z causes the old line to be copied up to and including the next occurrence of the following character. Note that the latter is not printed when it is typed in, but when it is reached in the line. This is accomplished by suppressing the echo for the character after the Z, another application of the full-duplex capabilities of the teletype. The result is that the edited line continues to be properly aligned with the old one.

The second example illustrates the use of X to skip to the next occurrence of a character; this instruction is exactly analogous to Z. Also shown is the insertion of characters: an E causes ordinary characters to be inserted rather than replace characters already existing. The E causes a < to be printed. A second E will switch back to replacing and print a [...]. Insertion of course spoils the alignment. It can be restored with a TYPE instruction (T), which types the remainder of the old line and the portion of the new line so far constructed, and aligns the ends properly. The third example illustrates this process.

A line edit is usually terminated by a carriage return, which suppresses the remainder of the old line, or by a D, which copies the remainder of the old line into the new line. Both are illustrated, in the first and second examples, respectively.
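The line-edit instructions described above can be modelled as a small interpreter that builds the new line. Several points are assumptions: control characters are represented by their ASCII codes, X skips up to and including the target (by analogy with Z), and T is treated as affecting only the printout. The function names are hypothetical.

```python
def ctl(letter: str) -> str:
    """Control character produced by holding CONTROL and pushing `letter`."""
    return chr(ord(letter.upper()) - 64)


C, S, Z, X, E, T, D = (ctl(c) for c in "CSZXETD")
CR = "\r"


def line_edit(old: str, keys: str) -> str:
    """Construct a new line from `old` under line-edit instructions `keys`."""
    new = []
    i = 0              # next character of the old line
    k = 0              # next keystroke
    inserting = False  # toggled by E
    while k < len(keys):
        ch = keys[k]
        k += 1
        if ch == C:                       # copy one character
            if i < len(old):
                new.append(old[i])
                i += 1
        elif ch == S:                     # skip one character
            i = min(i + 1, len(old))
        elif ch in (Z, X):                # copy (Z) or skip (X) through target
            target = keys[k] if k < len(keys) else ""
            k += 1
            j = old.find(target, i) if target else -1
            if j < 0:
                continue                  # illegal: bell rings, otherwise ignored
            if ch == Z:
                new.extend(old[i:j + 1])
            i = j + 1
        elif ch == E:                     # switch between inserting and replacing
            inserting = not inserting
        elif ch == T:                     # TYPE only realigns the printout
            continue
        elif ch == CR:                    # end; drop the rest of the old line
            break
        elif ch == D:                     # end; copy the rest of the old line
            new.extend(old[i:])
            break
        else:                             # ordinary character
            new.append(ch)
            if not inserting and i < len(old):
                i += 1                    # it replaces the old character
    return "".join(new)
```

One keystroke sequence that turns "WHY DO YOU THUS," into "WHY DOST THOU THUS," is Z O, E, "ST", E, C, "T", E, "H", E, D; it is not necessarily the sequence shown in Figure 3.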

Figure 4 is a list of the control characters recognized in a line edit. The ones dealing with tabs are useful for editing one field of a fixed-field line. Illegal instructions, such as Z followed by a character not in the old line, cause the teletype bell to ring and are otherwise ignored. Charac[...]

•SUBSTITUTE /ALPHA/ FOR /BETA/.

[...] conclusion has been [...] tentative [...] would be of marginal utility, since small programs could readily be written in SNOBOL or other string-processing languages to accomplish repetitive editing operations.

[...] QED [...] automatically loaded [...] the argument of [...] Any block of text deleted by a DELETE or CHANGE is put into buffer 1 (unless it is too big), as is any text altered by an EDIT or SUBSTITUTE. This allows an erroneous editing operation to be undone with a small amount of work. Finally, the [...] SUBSTITUTE command [...]

[...] change the first occurrence of "BETA" in the text to "ALPHA". [...] GOOD MEN [...]

[...] in the command above are the [...] is the delimiter for [...] "S", and it terminates [...] command completion had [...] would have appeared as [...] were performed in [...]


String Buffers

Although QED is not a programming language, it does have one feature which makes it possible, among other things, to write simple [...] input. [...] buffer can be called by the two [...] characters in the buffer were being typed in on the [...] of a loop command permits the string [...] to be performed repeatedly. For example,


Re-editing

[...] given to QED [...] the system can be told to store. At a later time [...] can be retrieved and fed back to QED as commands. In this way it is possible to maintain several slightly different versions of a body of text without using up a [...] storage space, simply by keeping one copy of the [...] containing editing commands [...] versions from the single standard one. In addition to the saving in space, there is the further advantage that changes in the text which are common to all versions need be made only once. Furthermore, since the [...] simply text, they can themselves be modified with QED before they are used to perform the edit. This is sometimes a good way of correcting errors in editing.

[...] combining several [...] reduced to INSERT, DELETE [...] combination and normalization is not a trivial one.

Conclusion

[...] on accommodating the system to the needs of the [...]


A method for locating specific character strings embedded
in character text is described and an implementation of this 
method in the form of a compiler is discussed. The compiler 
accepts a regular expression as source language and pro- 
duces an IBM 7094 program as object language. The object 
program then accepts the text to be searched as input and 
produces a signal every time an embedded string in the text 
matches the given regular expression. Examples, problems,
and solutions are also presented. 

KEY WORDS AND PHRASES: search, match, regular expression 
CR CATEGORIES: 3.74, 4.49, 5.32 

The Algorithm 

Previous search algorithms involve backtracking when 
a partially successful search path fails. This necessitates 
a lot of storage and bookkeeping, and executes slowly. In 
the regular expression recognition technique described in 
this paper, each character in the text to be searched is 
examined in sequence against a list of all possible current 
characters. During this examination a new list of all 
possible next characters is built. When the end of the 
current list is reached, the new list becomes the current 
list, the next character is obtained, and the process con- 
tinues. In the terms of Brzozowski [1], this algorithm con- 
tinually takes the left derivative of the given regular ex- 
pression with respect to the text to be searched. The 
parallel nature of this algorithm makes it extremely fast. 
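The list-based search can be illustrated with a hand-built automaton for a(b|c)*d. This is a Python sketch of the technique using lists of states rather than the compiled IBM 7094 transfer instructions described below; the state encoding and function names are assumptions. Adding the start state at every position makes the search report each place where a matching substring ends.

```python
# Hand-built NFA for a(b|c)*d. Each state is ("char", c, next),
# ("split", next1, next2) or ("match",).
AB_C_D = {
    0: ("char", "a", 1),
    1: ("split", 2, 5),   # closure: another (b|c), or go on to d
    2: ("split", 3, 4),   # alternation b | c
    3: ("char", "b", 1),
    4: ("char", "c", 1),
    5: ("char", "d", 6),
    6: ("match",),
}


def add_state(states, lst, seen, s):
    """Add s to lst, following split states so lst holds only char/match states."""
    if s in seen:
        return
    seen.add(s)
    node = states[s]
    if node[0] == "split":
        add_state(states, lst, seen, node[1])
        add_state(states, lst, seen, node[2])
    else:
        lst.append(s)


def search(states, start, text):
    """Return end offsets of all substrings matching the NFA, in one pass."""
    matches = []
    current = []
    add_state(states, current, set(), start)
    for pos in range(len(text) + 1):
        if any(states[s][0] == "match" for s in current):
            matches.append(pos)               # a match ends just before pos
        if pos == len(text):
            break
        nxt, seen = [], set()                 # list for the next character
        for s in current:
            node = states[s]
            if node[0] == "char" and node[1] == text[pos]:
                add_state(states, nxt, seen, node[2])
        add_state(states, nxt, seen, start)   # a new match may start here
        current = nxt
    return matches
```

Each character is examined once against the current list, and no state is ever revisited by backtracking.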

The Implementation

The specific implementation of this algorithm is a compiler that translates a regular expression into IBM 7094 code. The compiled code, along with certain runtime routines, accepts the text to be searched as input and finds all substrings in the text that match the regular expression. The compiling phase of the implementation does not detract from the overall speed since any search routine must translate the input regular expression into some sort of machine accessible form.

In the compiled code, the lists mentioned in the algorithm are not characters, but transfer instructions into the compiled code. The execution is extremely fast since a transfer to the top of the current list automatically searches for all possible sequel characters in the regular expression.

This compile-search algorithm is incorporated as the 
context search in a time-sharing text editor. This is by 
no means the only use of such a search routine. For 
example, a variant of this algorithm is used as the symbol 
table search in an assembler. 

It is assumed that the reader is familiar with regular 
expressions [2] and the machine language of the IBM 7094 
computer [3]. 

The Compiler 

The compiler consists of three concurrently running stages. The first stage is a syntax sieve that allows only syntactically correct regular expressions to pass. This stage also inserts the operator "•" for juxtaposition of regular expressions. The second stage converts the regular expression to reverse Polish form. The third stage is the object code producer. The first two stages are straightforward and are not discussed. The third stage expects a syntactically correct, reverse Polish regular expression.

The regular expression a(b|c)*d will be carried through as an example. This expression is translated into abc|*•d• by the first two stages. A functional description of the third stage of the compiler follows:
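Although the first two stages are not discussed, a short sketch may help in following the example. It writes "." for the juxtaposition operator "•", assumes the input has already passed the syntax sieve (balanced parentheses, no literal "."), and uses operator-precedence conversion, which is one standard way of producing reverse Polish form, not necessarily the one used in the original compiler.

```python
PRECEDENCE = {"|": 1, ".": 2}  # '.' stands for juxtaposition


def insert_juxtaposition(regex: str) -> str:
    """Make juxtaposition explicit, e.g. 'a(b|c)*d' -> 'a.(b|c)*.d'."""
    out = []
    for i, ch in enumerate(regex):
        if i > 0 and regex[i - 1] not in "(|" and ch not in ")|*":
            out.append(".")
        out.append(ch)
    return "".join(out)


def to_postfix(regex: str) -> str:
    """Convert an expression with explicit '.' to reverse Polish form."""
    out, ops = [], []
    for ch in regex:
        if ch == "(":
            ops.append(ch)
        elif ch == ")":
            while ops[-1] != "(":
                out.append(ops.pop())
            ops.pop()                  # discard the '('
        elif ch == "*":
            out.append(ch)             # unary postfix: follows its operand
        elif ch in PRECEDENCE:
            while ops and ops[-1] != "(" and PRECEDENCE[ops[-1]] >= PRECEDENCE[ch]:
                out.append(ops.pop())
            ops.append(ch)
        else:
            out.append(ch)             # operand character
    while ops:
        out.append(ops.pop())
    return "".join(out)
```

Applied to a(b|c)*d, the two functions give "abc|*.d.", the same reverse Polish string as in the text with "." in place of "•".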

The heart of the third stage is a pushdown stack. Each entry in the pushdown stack is a pointer to the compiled code of an operand. When a binary operator ("|" or "•") is compiled, the top (most recent) two entries on the stack are combined and a resultant pointer for the operation replaces the two stack entries. The result of the binary operator is then available as an operand in another operation. Similarly, a unary operator ("*") operates on the top entry of the stack and creates an operand to replace that entry. When the entire regular expression is compiled, there is just one entry in the stack, and that is a pointer to the code for the regular expression.

The compiled code invokes one of two functional routines. The first is called NNODE. NNODE matches a single character and will be represented by an oval containing the character that is recognized. The second functional routine is called CNODE. CNODE will split the current search path. It is represented by a circle with one input path and two output paths.

Figure 1 shows the functions of the third stage of the
compiler in translating the example regular expression.
The first three characters of the example, a, b, and c, each
create a stack entry, S[i], and an NNODE box.

[Fig. 1: stack entries S[0] to S[2] for the operands a, b, c]

The next character, "|", combines the operands b and c
with a CNODE to form b | c as an operand. (See Figure 2.)

[Fig. 2: the operand b | c]

The next character, "*", operates on the top entry on the
stack. The closure operator is realized with a CNODE by
noting the identity X* = λ | XX*, where X is any regular
expression (operand) and λ is the null regular expression.
(See Figure 3.)

The next character, "·", compiles no code, but just
combines the top two entries on the stack to be executed
sequentially. The stack now points to the single operand
a·(b | c)*. (See Figure 4.)

[Fig. 4: S[0] points to a·(b | c)*]

The final two characters, d and ·, compile and connect an
NNODE onto the existing code to produce the final regular
expression in the only stack entry. (See Figure 5.)

[Fig. 5]
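The stack discipline behind Figures 1 to 5 can be replayed symbolically. In this hedged Python sketch each stack entry holds a printable description of its operand instead of a pointer to compiled code. Each entry also records how tightly its outermost operator binds, so that parentheses can be added where needed. The "." stands for "·", and `trace_stack` is a name chosen here, not one from the paper.

```python
ATOM, CONCAT, ALT = 3, 2, 1  # binding strength of an operand's outermost operator


def trace_stack(rpn: str) -> list[list[str]]:
    """Return the stack contents after each character of a reverse Polish expression."""
    stack: list[tuple[str, int]] = []
    snapshots: list[list[str]] = []

    def wrap(entry: tuple[str, int], needed: int) -> str:
        text, strength = entry
        return text if strength >= needed else f"({text})"

    for ch in rpn:
        if ch in "|.":                       # binary: two entries become one
            right, left = stack.pop(), stack.pop()
            if ch == "|":
                stack.append((f"{left[0]}|{right[0]}", ALT))
            else:                            # juxtaposition: no new code
                stack.append((wrap(left, CONCAT) + wrap(right, CONCAT), CONCAT))
        elif ch == "*":                      # unary: replaces the top entry
            stack.append((wrap(stack.pop(), ATOM) + "*", ATOM))
        else:                                # operand: new entry (an NNODE box)
            stack.append((ch, ATOM))
        snapshots.append([text for text, _ in stack])
    return snapshots


for ch, snapshot in zip("abc|*.d.", trace_stack("abc|*.d.")):
    print(ch, snapshot)
```

After "|" the stack holds a and b|c (Figure 2). After "·" it holds the single entry a(b|c)* (Figure 4). After the final "·" it holds the single entry a(b|c)*d, as the text requires.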


A working example of the third stage of the compiler 
appears below. It is written in Algol-60 and produces 
object programs in IBM 7094 machine language. 


begin
   integer procedure get character; code;
   integer procedure instruction(op, address, tag, decrement); code;
   integer procedure value(symbol); code;
   integer procedure index(character); code;
   integer char, lc, pc;
   integer array stack[0:10], code[0:300];
   switch switch := alpha, juxta, closure, or, eof;
   lc := pc := 0;
advance:
   char := get character;
   go to switch[index(char)];
   comment an operand: fail unless the character matches, then call NNODE;
alpha:
   code[pc] := instruction(‘tra’, value(‘code’)+pc+1, 0, 0);
   code[pc+1] := instruction(‘txl’, value(‘fail’), 1, -char-1);
   code[pc+2] := instruction(‘txh’, value(‘fail’), 1, -char);
   code[pc+3] := instruction(‘tsx’, value(‘nnode’), 4, 0);
   stack[lc] := pc;
   pc := pc+4;
   lc := lc+1;
   go to advance;
   comment juxtaposition compiles no code, since the operands are already adjacent;
juxta:
   lc := lc-1;
   go to advance;
   comment closure: the operand is entered through a CNODE that either runs it or leaves, and the operand exits back into that CNODE;
closure:
   code[pc] := instruction(‘tsx’, value(‘cnode’), 4, 0);
   code[pc+1] := code[stack[lc-1]];
   code[stack[lc-1]] := instruction(‘tra’, value(‘code’)+pc, 0, 0);
   pc := pc+2;
   go to advance;
   comment alternation: a CNODE starts both operands, and both exits meet at pc+4;
or:
   code[pc] := instruction(‘tra’, value(‘code’)+pc+4, 0, 0);
   code[pc+1] := instruction(‘tsx’, value(‘cnode’), 4, 0);
   code[pc+2] := code[stack[lc-1]];
   code[pc+3] := code[stack[lc-2]];
   code[stack[lc-2]] := instruction(‘tra’, value(‘code’)+pc+1, 0, 0);
   code[stack[lc-1]] := instruction(‘tra’, value(‘code’)+pc+4, 0, 0);
   pc := pc+4;
   lc := lc-1;
   go to advance;
eof:
   code[pc] := instruction(‘tra’, value(‘found’), 0, 0);
   pc := pc+1
end






The integer procedure get character returns the next
character from the second stage of the compiler. The
integer procedure index returns an integer index to classify
the character. The integer procedure value returns the
location of a named subroutine. It is an assembler symbol
table routine. The integer procedure instruction returns an
assembled 7094 instruction.
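It may not be obvious why a TXL/TXH pair is enough to recognize one character. The following Python sketch models the arithmetic under our reading of the 7094: index registers and decrements are 15-bit fields, and XCHG's PAC ,1 leaves the two's complement of the character in index register 1. It also assumes that TXL transfers when the index is less than or equal to the decrement, and that TXH transfers when it is greater. These semantics, the 6-bit character codes, and the helper names are assumptions to be checked against the 7094 manual [3].

```python
MASK = 0o77777  # index registers and decrement fields are 15 bits wide (assumed)


def complement(n: int) -> int:
    """Two's complement of n within a 15-bit field."""
    return (-n) & MASK


def guard_passes(char_code: int, wanted: int) -> bool:
    """Model of  TXL FAIL,1,-wanted-1  followed by  TXH FAIL,1,-wanted."""
    ir1 = complement(char_code)           # PAC ,1 leaves -char in IR1 (assumed)
    if ir1 <= complement(wanted + 1):     # TXL: transfer to FAIL if IR1 <= D,
        return False                      # i.e. the character is above wanted
    if ir1 > complement(wanted):          # TXH: transfer to FAIL if IR1 > D,
        return False                      # i.e. the character is below wanted
    return True


# Only the wanted code gets through (codes 1 to 63 assumed, as for a 6-bit set).
print([c for c in range(1, 64) if guard_passes(c, 0o21)])  # [17]
```

The first test rejects codes above the wanted one and the second rejects codes below it, so the pair acts as an equality test built from two one-sided comparisons.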

When the compiler receives the example regular expression,
the following 7094 code is produced:

CODE    TRA   CODE+1           0   a
        TXL   FAIL,1,-'a'-1    1
        TXH   FAIL,1,-'a'      2
        TSX   NNODE,4          3
        TRA   CODE+16          4   b
        TXL   FAIL,1,-'b'-1    5
        TXH   FAIL,1,-'b'      6
        TSX   NNODE,4          7
        TRA   CODE+16          8   c
        TXL   FAIL,1,-'c'-1    9
        TXH   FAIL,1,-'c'      10
        TSX   NNODE,4          11
        TRA   CODE+16          12  |
        TSX   CNODE,4          13
        TRA   CODE+9           14
        TRA   CODE+5           15
        TSX   CNODE,4          16  *
        TRA   CODE+13          17
        TRA   CODE+19          18  d
        TXL   FAIL,1,-'d'-1    19
        TXH   FAIL,1,-'d'      20
        TSX   NNODE,4          21
        TRA   FOUND            22  eof
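For readers who want to run the third stage, here is a hedged Python transliteration of the ALGOL program above. Instead of assembled 7094 words it emits symbolic tuples, an encoding invented for this sketch. ("TRA", n) stands for TRA CODE+n and ("TRA", "FOUND") for TRA FOUND. ("TXL", c) and ("TXH", c) stand for the two FAIL tests on character c, and ("NNODE", None) and ("CNODE", None) for the TSX calls. The ASCII "." stands for "·".

```python
def compile_rpn(rpn: str) -> list[tuple]:
    """Third stage: reverse Polish regular expression -> symbolic object code."""
    code: list[tuple] = []
    stack: list[int] = []            # entry location of each operand
    for ch in rpn:
        pc = len(code)               # next free location
        if ch == ".":                # juxta: operands are already adjacent
            stack.pop()
        elif ch == "*":              # closure
            x = stack[-1]
            code += [("CNODE", None), code[x]]
            code[x] = ("TRA", pc)    # entering X now goes through the CNODE
        elif ch == "|":              # or
            y = stack.pop()
            x = stack[-1]
            code += [("TRA", pc + 4), ("CNODE", None), code[y], code[x]]
            code[x] = ("TRA", pc + 1)
            code[y] = ("TRA", pc + 4)
        else:                        # alpha: test one character, then NNODE
            code += [("TRA", pc + 1), ("TXL", ch), ("TXH", ch), ("NNODE", None)]
            stack.append(pc)
    code.append(("TRA", "FOUND"))    # eof
    return code


for loc, (op, arg) in enumerate(compile_rpn("abc|*.d.")):
    print(loc, op, "" if arg is None else arg)
```

The 23 entries printed line up with locations 0 to 22 of the listing above. For example, location 4 ends up as ("TRA", 16) because the closure later overwrites the TRA CODE+13 that the "|" had placed there.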

Runtime Routines 

During execution of the code produced by the compiler,
two lists (named CLIST and NLIST) are maintained by
the subroutines CNODE and NNODE. CLIST contains
a list of TSX **,2 instructions terminated by a TRA
XCHG. Each TSX represents a partial match of the
regular expression, and the TRA XCHG represents the
end of the list of possible matches. A call to CNODE from
location x moves the TRA XCHG instruction down
one location in CLIST and inserts in its place a TSX
x+1,2 instruction. Control is then returned to x+2.
This effectively branches the current search path. The
path at x+1 is deferred until later, while the branch at
x+2 is searched immediately. The code for CNODE is as
follows:


CNODE   AXC   **,7         CLIST COUNT
        CAL   CLIST,7      MOVE TRA XCHG
        SLW   CLIST+1,7    INSTRUCTION
        PCA   ,4
        ACL   TSXCMD       INSERT NEW TSX **,2
        SLW   CLIST,7      INSTRUCTION
        TXI   *+1,7,-1     INCREMENT CLIST
        SCA   CNODE,7      COUNT
        TRA   2,4          RETURN
TSXCMD  TSX   1,2          CONSTANT, NOT EXECUTED


The subroutine NNODE is called after a successful
match of the current character. This routine, when called
from location x, places a TSX x+1,2 in NLIST. It
then returns to the next instruction in CLIST. This sets
up the place in CODE to be executed with the next
character. The code for NNODE is as follows:

NNODE   AXC   **,7         NLIST COUNT
        PCA   ,4
        ACL   TSXCMD       PLACE NEW TSX **,2
        SLW   NLIST,7      INSTRUCTION
        TXI   *+1,7,-1     INCREMENT NLIST
        SCA   NNODE,7      COUNT
        TRA   1,2
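Leaving out the index-register bookkeeping for the list counts, CNODE and NNODE are two small list operations. In this hedged Python model, CLIST is a list of ("TSX", x) entries that ends in a ("TRA", "XCHG") sentinel, and NLIST is a plain list. The function names are hypothetical. A return value of None stands for TRA 1,2, that is, "go on with the next CLIST entry".

```python
SENTINEL = ("TRA", "XCHG")


def cnode(clist: list, x: int) -> int:
    """CNODE called from location x: defer the path at x+1, continue at x+2."""
    assert clist and clist[-1] == SENTINEL
    clist.append(SENTINEL)           # move TRA XCHG down one location
    clist[-2] = ("TSX", x + 1)       # and put TSX x+1,2 in its place
    return x + 2                     # TRA 2,4: resume at x+2


def nnode(nlist: list, x: int) -> None:
    """NNODE called from location x: continue at x+1 on the next character."""
    nlist.append(("TSX", x + 1))     # place TSX x+1,2 in NLIST
    return None                      # TRA 1,2: next entry in CLIST


# The closure CNODE at location 16 of the example listing:
clist = [("TSX", 4), SENTINEL]
print(cnode(clist, 16), clist)   # 18 [('TSX', 4), ('TSX', 17), ('TRA', 'XCHG')]
nlist: list = []
nnode(nlist, 21)                 # the NNODE for d
print(nlist)                     # [('TSX', 22)]
```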


The routine FAIL simply returns to the next entry in 
the current list CLIST. 

FAIL TRA 1,2 

The routine XCHG is transferred to when the current 
list is exhausted. This routine copies NLIST onto CLIST, 
appends a TRA XCHG instruction, gets a new character 
in index register one, and transfers to CLIST. The instruc- 
tion TSX CODE, 2 is also executed to start a new 
search of the entire regular expression with each character. 
Thus the regular expression will be found anywhere in the 
text to be searched. Variations can be easily incorporated. 
The code for XCHG is:

XCHG    LAC   NNODE,7      PICK UP NLIST COUNT
        AXC   0,6          PICK UP CLIST COUNT
X1      TXL   X2,7,0
        TXI   *+1,7,1
        CAL   NLIST,7
        SLW   CLIST,6      COPY NLIST ONTO CLIST
        TXI   X1,6,-1
X2      CLA   TRACMD
        SLW   CLIST,6      PUT TRA XCHG AT BOTTOM
        SCA   CNODE,6      INITIALIZE CNODE COUNT
        SCA   NNODE,0      INITIALIZE NNODE COUNT
        TSX   GETCHA,4
        PAC   ,1           GET NEXT CHARACTER
        TSX   CODE,2       START SEARCH
        TRA   CLIST        FINISH SEARCH
TRACMD  TRA   XCHG         CONSTANT, NOT EXECUTED

Initialization is required to set up the initial lists and
start the first character.

INIT    SCA   NNODE,0
        TRA   XCHG

The routine FOUND is transferred to for each successful
match of the entire regular expression. There is a
one-character delay between the end of a successful match
and the transfer to FOUND. The null regular expression
is found on the first character, while one-character regular
expressions are found on the second character. This means
that an extra (end of file) character must be put through
the code in order to obtain complete results. FOUND
depends upon the use of the search routine and is therefore
not discussed in detail.
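The cooperation of CODE, CLIST, NLIST, XCHG, FAIL, and FOUND, including the one-character delay, can be checked with a small interpreter. This is a hedged Python model, not the 7094 routines. The lists hold code locations instead of TSX instructions, and the instruction tuples are an encoding chosen for this sketch. FOUND is assumed to record the current text position and then continue with the next CLIST entry. The code is the example listing for a(b | c)*d.

```python
# Example listing for a(b | c)*d in a symbolic encoding (an assumption of this sketch):
#   ("TRA", n)  TRA CODE+n      ("TRA", "FOUND")  TRA FOUND
#   ("TXL", c)  TXL FAIL,1,-c-1 (fail if the character is above c)
#   ("TXH", c)  TXH FAIL,1,-c   (fail if the character is below c)
#   ("NNODE", None), ("CNODE", None)  TSX NNODE,4 and TSX CNODE,4
CODE = [
    ("TRA", 1), ("TXL", "a"), ("TXH", "a"), ("NNODE", None),   # 0-3   a
    ("TRA", 16), ("TXL", "b"), ("TXH", "b"), ("NNODE", None),  # 4-7   b
    ("TRA", 16), ("TXL", "c"), ("TXH", "c"), ("NNODE", None),  # 8-11  c
    ("TRA", 16), ("CNODE", None), ("TRA", 9), ("TRA", 5),      # 12-15 |
    ("CNODE", None), ("TRA", 13),                              # 16-17 *
    ("TRA", 19), ("TXL", "d"), ("TXH", "d"), ("NNODE", None),  # 18-21 d
    ("TRA", "FOUND"),                                          # 22    eof
]


def search(code: list, text: str) -> list[int]:
    """Return the text positions at which control reaches FOUND."""
    found: list[int] = []
    nlist: list[int] = []
    for pos, ch in enumerate(text):
        # XCHG: NLIST becomes CLIST and NLIST is emptied; TSX CODE,2 starts a
        # fresh search at location 0, modeled here as the first entry.
        clist, nlist = [0] + nlist, []
        k = 0
        while k < len(clist):              # reaching the end is TRA XCHG
            loc = clist[k]
            k += 1
            while True:
                op, arg = code[loc]
                if op == "TRA" and arg == "FOUND":
                    found.append(pos)      # assumed: record, then next CLIST entry
                    break
                if op == "TRA":
                    loc = arg
                elif op == "TXL":
                    if ch > arg:
                        break              # FAIL: TRA 1,2
                    loc += 1
                elif op == "TXH":
                    if ch < arg:
                        break              # FAIL: TRA 1,2
                    loc += 1
                elif op == "NNODE":
                    nlist.append(loc + 1)  # resume here with the next character
                    break
                else:                      # CNODE: defer loc+1, continue at loc+2
                    clist.append(loc + 1)
                    loc += 2
    return found


print(search(CODE, "xacbd#"))  # [5]: acbd ends at position 4, FOUND fires at 5
```

Searching "abd" finds nothing, while "abd#" reports position 3. Without the extra end-of-file character, the last match is never reported, as the text explains.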

The integer procedure GETCHA (called from XCHG)
obtains the next character from the text to be searched.
This character is right adjusted in the accumulator.
GETCHA must also recognize the end of the text and
terminate the search.

Notes 

Code compiled for X** will go into a loop due to the
closure operator on an operand containing the null regular
expression, λ. There are two ways out of this problem. The
first is to not allow such an expression to get through the
syntax sieve. In most practical applications, this would
not be a serious restriction. The second way out is to
recognize lambda separately in operands and remember
the CODE location of the recognition of lambda. This
means that a* is compiled as a search for λ | aa*. If the
closure operation is performed on an operand containing
lambda, the instruction TRA FAIL is overlaid on that
portion of the operand that recognizes lambda. Thus a**
is compiled as λ | (aa*)*.
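The first way out can be sketched as an additional check in the syntax sieve. This hedged Python function, whose name is ours, works on the reverse Polish form, with "." for "·". For each stack entry it tracks whether the operand contains λ. A single character does not, X* always does, X | Y does if either operand does, and XY does only if both do. The function reports any closure applied to an operand that already contains λ.

```python
def closure_of_lambda(rpn: str) -> bool:
    """True if some "*" applies to an operand that contains lambda."""
    contains_lambda: list[bool] = []       # one flag per stack entry
    for ch in rpn:
        if ch == "*":
            if contains_lambda[-1]:
                return True                # X* with lambda in X: reject
            contains_lambda[-1] = True     # X* always contains lambda
        elif ch == "|":
            right = contains_lambda.pop()
            contains_lambda[-1] = contains_lambda[-1] or right
        elif ch == ".":
            right = contains_lambda.pop()
            contains_lambda[-1] = contains_lambda[-1] and right
        else:
            contains_lambda.append(False)  # a single character never matches lambda
    return False


print(closure_of_lambda("a**"), closure_of_lambda("abc|*.d."))  # True False
```

This check also rejects expressions such as (a*b*)*, written a*b*.* in reverse Polish, whose operand contains λ only through a juxtaposition.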


The array lambda is added to the third stage of the previous
compiler. It contains zero if the corresponding operand does
not contain λ. It contains the code location of the recognition
of λ if the operand does contain λ. (The code location of the
recognition of λ can never be zero.)

begin
   integer procedure get character; code;
   integer procedure instruction(op, address, tag, decrement); code;
   integer procedure value(symbol); code;
   integer procedure index(character); code;
   integer char, lc, pc;
   integer array stack, lambda[0:10], code[0:300];
   switch switch := alpha, juxta, closure, or, eof;
   lc := pc := 0;
advance:
   char := get character;
   go to switch[index(char)];
alpha:
   code[pc] := instruction(‘tra’, value(‘code’)+pc+1, 0, 0);
   code[pc+1] := instruction(‘txl’, value(‘fail’), 1, -char-1);
   code[pc+2] := instruction(‘txh’, value(‘fail’), 1, -char);
   code[pc+3] := instruction(‘tsx’, value(‘nnode’), 4, 0);
   stack[lc] := pc;
   lambda[lc] := 0;
   pc := pc+4;
   lc := lc+1;
   go to advance;
   comment XY contains lambda only if both X and Y do;
juxta:
   if lambda[lc-1] = 0 then
      lambda[lc-2] := 0;
   lc := lc-1;
   go to advance;
   comment X* becomes lambda | XX*, and a lambda already in X is overlaid with TRA FAIL;
closure:
   code[pc] := instruction(‘tsx’, value(‘cnode’), 4, 0);
   code[pc+1] := code[stack[lc-1]];
   code[pc+2] := instruction(‘tra’, value(‘code’)+pc+6, 0, 0);
   code[pc+3] := instruction(‘tsx’, value(‘cnode’), 4, 0);
   code[pc+4] := code[stack[lc-1]];
   code[pc+5] := instruction(‘tra’, value(‘code’)+pc+6, 0, 0);
   code[stack[lc-1]] := instruction(‘tra’, value(‘code’)+pc+3, 0, 0);
   if lambda[lc-1] ≠ 0 then
      code[lambda[lc-1]] := instruction(‘tra’, value(‘fail’), 0, 0);
   lambda[lc-1] := pc+5;
   pc := pc+6;
   go to advance;
   comment if both operands contain lambda, the second one's lambda exit is sent to the first one's;
or:
   code[pc] := instruction(‘tra’, value(‘code’)+pc+4, 0, 0);
   code[pc+1] := instruction(‘tsx’, value(‘cnode’), 4, 0);
   code[pc+2] := code[stack[lc-1]];
   code[pc+3] := code[stack[lc-2]];
   code[stack[lc-2]] := instruction(‘tra’, value(‘code’)+pc+1, 0, 0);
   code[stack[lc-1]] := instruction(‘tra’, value(‘code’)+pc+4, 0, 0);
   if lambda[lc-2] = 0 then
      begin if lambda[lc-1] ≠ 0 then
         lambda[lc-2] := lambda[lc-1]
      end
   else if lambda[lc-1] ≠ 0 then
      code[lambda[lc-1]] := instruction(‘tra’, value(‘code’)+lambda[lc-2], 0, 0);
   pc := pc+4;
   lc := lc-1;
   go to advance;
eof:
   code[pc] := instruction(‘tra’, value(‘found’), 0, 0);
   pc := pc+1
end
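As a cross-check of the lambda bookkeeping, here is a hedged Python transliteration of the modified third stage. It emits symbolic tuples rather than 7094 words, using an encoding chosen for this sketch. ("TRA", n) stands for TRA CODE+n, and ("TRA", "FAIL") and ("TRA", "FOUND") stand for TRA FAIL and TRA FOUND. ("TXL", c) and ("TXH", c) are the character tests, and ("CNODE", None) and ("NNODE", None) are the TSX calls. The "." stands for "·", and the names are ours.

```python
def compile_rpn_lambda(rpn: str) -> list[tuple]:
    """Modified third stage with the lambda array, emitting symbolic code."""
    code: list[tuple] = []
    stack: list[int] = []      # entry location of each operand
    lam: list[int] = []        # location recognizing lambda, 0 if none
    for ch in rpn:
        pc = len(code)
        if ch == ".":                                   # juxta
            if lam[-1] == 0:
                lam[-2] = 0        # XY contains lambda only if both X and Y do
            stack.pop()
            lam.pop()
        elif ch == "*":                                 # closure: lambda | X X*
            x = stack[-1]
            entry = code[x]
            code += [("CNODE", None), entry, ("TRA", pc + 6),
                     ("CNODE", None), entry, ("TRA", pc + 6)]
            code[x] = ("TRA", pc + 3)
            if lam[-1] != 0:
                code[lam[-1]] = ("TRA", "FAIL")    # overlay X's own lambda path
            lam[-1] = pc + 5
        elif ch == "|":                                 # or
            y = stack.pop()
            x = stack[-1]
            code += [("TRA", pc + 4), ("CNODE", None), code[y], code[x]]
            code[x] = ("TRA", pc + 1)
            code[y] = ("TRA", pc + 4)
            lam_y = lam.pop()
            if lam[-1] == 0:
                lam[-1] = lam_y                    # inherit Y's lambda, if any
            elif lam_y != 0:
                code[lam_y] = ("TRA", lam[-1])     # merge the two lambda exits
        else:                                           # alpha
            code += [("TRA", pc + 1), ("TXL", ch), ("TXH", ch), ("NNODE", None)]
            stack.append(pc)
            lam.append(0)
    code.append(("TRA", "FOUND"))                       # eof
    return code


for loc, (op, arg) in enumerate(compile_rpn_lambda("a**")):
    print(loc, op, "" if arg is None else arg)
```

Compiling a** shows the overlay at work. Location 9, the λ exit of the inner closure, becomes TRA FAIL, and location 15 becomes the λ exit of the whole expression.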

Communications of the ACM


The next note on the implementation is that the sizes
of the two runtime lists can grow quite large. For example,
the expression a*a*a*a*a*a* explodes when it encounters
a run of a's. This expression is equivalent to a*
and therefore should not generate so many entries. Such
redundant searches can be easily terminated by having
NNODE (CNODE) search NLIST (CLIST) for a matching
entry before it puts an entry in the list. This now gives
a maximum on the number of entries that can be in the
lists. The maximum number of entries that can be in
CLIST is the number of TSX CNODE,4 and TSX
NNODE,4 instructions compiled. The maximum number
of entries in NLIST is just the number of TSX
NNODE,4 instructions compiled. In practice, these
maxima are never met.
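The suggested cure, searching a list before adding an entry, can be tried on a model. This hedged Python sketch compiles with the unmodified third stage, using symbolic tuples, "." for "·", and names of our own. It then interprets the code with CLIST and NLIST held as lists of locations, and reports the largest list sizes seen, with and without the duplicate check. As in the 7094 routines, the fresh search that XCHG starts at CODE for each character is not counted as a CLIST entry.

```python
def compile_rpn(rpn: str) -> list[tuple]:
    """Unmodified third stage, emitting symbolic instructions."""
    code: list[tuple] = []
    stack: list[int] = []
    for ch in rpn:
        pc = len(code)
        if ch == ".":
            stack.pop()
        elif ch == "*":
            x = stack[-1]
            code += [("CNODE", None), code[x]]
            code[x] = ("TRA", pc)
        elif ch == "|":
            y = stack.pop()
            x = stack[-1]
            code += [("TRA", pc + 4), ("CNODE", None), code[y], code[x]]
            code[x] = ("TRA", pc + 1)
            code[y] = ("TRA", pc + 4)
        else:
            code += [("TRA", pc + 1), ("TXL", ch), ("TXH", ch), ("NNODE", None)]
            stack.append(pc)
    return code + [("TRA", "FOUND")]


def max_list_sizes(code: list, text: str, dedup: bool) -> tuple[int, int]:
    """Largest CLIST and NLIST sizes seen while searching text."""
    max_clist = max_nlist = 0
    nlist: list[int] = []
    for ch in text:
        work, nlist = [0] + nlist, []  # work[0] is the fresh TSX CODE,2 start
        k = 0
        while k < len(work):
            loc = work[k]
            k += 1
            while True:
                op, arg = code[loc]
                if op == "TRA":
                    if arg == "FOUND":
                        break                      # FOUND: assumed to just continue
                    loc = arg
                elif op == "TXL" or op == "TXH":
                    failed = ch > arg if op == "TXL" else ch < arg
                    if failed:
                        break                      # FAIL
                    loc += 1
                elif op == "NNODE":
                    if not (dedup and loc + 1 in nlist):
                        nlist.append(loc + 1)
                    break
                else:                              # CNODE
                    if not (dedup and loc + 1 in work):
                        work.append(loc + 1)
                    loc += 2
        max_clist = max(max_clist, len(work) - 1)  # CLIST excludes work[0]
        max_nlist = max(max_nlist, len(nlist))
    return max_clist, max_nlist


code = compile_rpn("a*a*.a*.a*.a*.a*.")          # a*a*a*a*a*a* in reverse Polish
print(max_list_sizes(code, "aaaaaaaa", dedup=False))
print(max_list_sizes(code, "aaaaaaaa", dedup=True))
```

Without the check, NLIST grows with every further a. With the check, both sizes stay within the bounds stated in the text: the six TSX NNODE,4 instructions for NLIST, and those plus the six TSX CNODE,4 instructions for CLIST.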

The execution is so fast that any other recognition and
deletion of redundant searches, such as described by Kuno
and Oettinger [4], would probably waste time.

This compiling scheme is very amenable to the extension
of the regular expressions recognized. Special characters
can be introduced to match special situations or sequences.
Examples include: beginning of line character, end of line
character, any character, alphabetic character, any number
of spaces character, lambda, etc. It is also easy to
incorporate new operators in the regular expression
routine. Examples include: not, exclusive or, intersection, etc.
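As one hedged illustration of such an extension (not taken from the paper), an "alphabetic character" can reuse the compiled TXL/TXH pair. The first test rejects characters above a bound and the second rejects characters below one, so widening the two bounds turns the equality test into a range test. The sketch assumes a hypothetical special character "@" for the lower-case letters, and assumes those letters have consecutive codes (true in ASCII, not in every character set). It also uses a symbolic tuple encoding of the instructions. Only the alpha branch of the third stage would change.

```python
CLASSES = {"@": ("a", "z")}  # hypothetical: "@" means any lower-case letter


def compile_alpha(ch: str, pc: int) -> list[tuple]:
    """Symbolic code for one operand at location pc (the alpha branch only)."""
    low, high = CLASSES.get(ch, (ch, ch))  # an ordinary character is a range of one
    return [("TRA", pc + 1),
            ("TXL", high),   # like TXL FAIL,1,-high-1: fail if the character is above high
            ("TXH", low),    # like TXH FAIL,1,-low: fail if the character is below low
            ("NNODE", None)]


def operand_accepts(code: list[tuple], ch: str) -> bool:
    """Apply the TXL/TXH pair of one compiled operand to a character."""
    (_, high), (_, low) = code[1], code[2]
    return low <= ch <= high


print(compile_alpha("@", 0))
print([c for c in "aQz!" if operand_accepts(compile_alpha("@", 0), c)])  # ['a', 'z']
```

Because the operand still begins with a TRA, the closure and alternation code can move its first word exactly as before.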